Abstract

In this paper, we propose a linear complexity encoding method for arbitrary LDPC codes. We 
start from a simple graph-based encoding method "label-and-decide." We prove that the "label-and- 
decide" method is applicable to Tanner graphs with a hierarchical structure — pseudo-trees — and that 
the resulting encoding complexity is linear with the code block length. Next, we define a second type of 
Tanner graphs — the encoding stopping set. The encoding stopping set is encoded in linear complexity by 
a revised label-and-decide algorithm — the "label-decide-recompute." Finally, we prove that any Tanner 
graph can be partitioned into encoding stopping sets and pseudo-trees. By encoding each encoding 
stopping set or pseudo-tree sequentially, we develop a linear complexity encoding method for general 
LDPC codes where the encoding complexity is proved to be less than $4 \cdot M \cdot (\bar{k} - 1)$, where $M$ is the
number of independent rows in the parity check matrix and $\bar{k}$ represents the mean row weight of the
parity check matrix.
I. Introduction

Low Density Parity Check (LDPC) codes [1] are excellent error correcting codes with performance
close to the Shannon capacity [2]. The key weakness of LDPC codes is their apparently
high encoding complexity. The conventional way to encode LDPC codes is to multiply the data
words $\vec{s}$ by the code generator matrix $G$, i.e., the code words are $\vec{c} = G \cdot \vec{s}$. Though the
parity-check matrix $H$ for LDPC codes is sparse, the associated generator matrix $G$ is not. The
encoding complexity of LDPC codes is therefore $O(n^2)$, where $n$ is the block length of the LDPC code. For
moderate to high code block lengths $n$, this quadratic behavior is very significant and it severely
limits the application of LDPC codes. For example, LDPC codes have advantages over turbo
codes [3] in almost every aspect except that LDPC codes have $O(n^2)$ encoding complexity, while
turbo codes have $O(n)$ encoding complexity. It is highly desirable to reduce the $O(n^2)$ encoding
complexity of LDPC codes.

Several authors have addressed the issue of speeding encoding of LDPC codes and, largely 
speaking, they follow three different paths. The first path designs efficient encoding methods for 
particular types of LDPC codes. We list a few representative examples. Reference [4] proposes a linear
complexity encoding method for cycle codes — LDPC codes with column weight 2. Reference [5] 
presents an efficient encoder for quasi-cyclic LDPC codes. In [6], an efficient encoding approach 
is proposed for Reed-Solomon-type array codes. Reference [7] shows that there exists a linear 
time encoder for turbo-structured LDPC codes. Reference [8] constructs LDPC codes based
on finite geometries and proves that this type of structured LDPC codes can be encoded in 
linear time. In [9], [11], two families of irregular LDPC Codes with cyclic structure and low 
encoding complexity are designed. In addition, an approximately lower triangular ensemble of 
LDPC Codes [10] was proposed to facilitate almost linear complexity encoding. The above low 
complexity encoders are only applicable to a small subset of LDPC codes, and some of the LDPC 
codes discussed above have performance loss when compared to randomly constructed LDPC 
codes. The second path borrows the decoder architecture and encodes LDPC codes iteratively 
on their Tanner graphs [12], [13]. The iterative LDPC encoding algorithm is easy to implement. 
However, there is no guarantee that iterative encoding will successfully get the codeword. In 
particular, the iterative encoding method will get trapped at the stopping set. The third path 
utilizes the sparseness of the parity check matrix to design a low complexity encoder. In [14], 
the authors present an algorithm named "greedy search" that reduces the coefficient of the
quadratic term. Although this encoding method is relatively efficient, its computational complexity and
matrix storage requirements still need to be reduced further for most practical applications.

In this paper, we develop an exact linear complexity encoding method for arbitrary LDPC 
codes. We start from two particular Tanner graph structures — "pseudo-tree" and "encoding 
stopping set" — and prove that both the pseudo-tree and the encoding stopping set LDPC codes 
can be encoded in linear time. Next, we prove that any LDPC code with maximum column weight 
three can be decomposed into pseudo-trees and encoding stopping sets. Therefore, LDPC codes 
with maximum column weight three can be encoded in linear time and the encoding complexity 
is no more than $2 \cdot M \cdot (\bar{k} - 1)$, where $M$ denotes the number of independent rows of the parity
check matrix and $\bar{k}$ represents the average row weight. Finally, we extend the $O(n)$ complexity
encoder to LDPC codes with arbitrary row weight distributions and column weight distributions.
For arbitrary LDPC codes, we achieve $O(n)$ encoding complexity, not exceeding $4 \cdot M \cdot (\bar{k} - 1)$.

The remainder of the paper is organized as follows. In Section II, we introduce relevant
definitions and notation. Section III proposes a simple encoding algorithm "label-and-decide" 
that directly encodes an LDPC code on its Tanner graph. Section IV presents a particular 
type of Tanner graph with multi-layers — "pseudo-tree" and proves that any pseudo-tree can 
be encoded successfully in linear time by the label-and-decide algorithm. Section V studies the 
complement of the pseudo-tree — "encoding stopping set." Section VI proves that the encoding 
stopping set can also be encoded in linear time by an encoding method named "label-decide- 
recompute." Section VII demonstrates that any LDPC code with column weight at most three 
can be decomposed into pseudo-trees and encoding stopping sets. By encoding each pseudo-tree 
or encoding stopping set sequentially using the label-and-decide or the label-decide-recompute 
algorithms, we achieve linear complexity encoding for LDPC codes with maximum column 
weight three. Finally, in the same section, we extend this linear time encoding method to LDPC
codes with arbitrary column weight distributions and row weight distributions. Section VIII
concludes the paper. 

II. Notation 

LDPC codes. LDPC codes can be described by their parity-check matrix or their associated 
Tanner graph [15]. In the Tanner graph, each bit becomes a bit node and each parity-check 
constraint becomes a check node. If a bit is involved in a parity-check constraint, there is an
edge connecting the bit node and the corresponding check node. The degree of a check node in
a Tanner graph is equivalent to the number of ones in the corresponding row of the parity check
matrix, or, in other words, the row weight of the corresponding row. We will use the terms
"degree of a check node" and "row weight" interchangeably in this paper. Similarly, the degree 
of a bit node in a Tanner graph is equivalent to the column weight of the corresponding column 
of the parity check matrix, and we will interchangeably use the term "degree of a bit node" and 
"column weight" in this paper. The LDPC codes discussed in this paper may be irregular, i.e., 
different columns of the parity check matrix have different column weights and different rows 
of the parity check matrix have different row weights. The parity check matrix of an LDPC code 
may not be of full rank. If a row in the parity check matrix can be written as the binary sums 
of some other rows in the parity check matrix, this row is said to be dependent on the other 
rows. Otherwise, it is an independent row. 

Arithmetic over the binary field. We represent by "$\oplus$" the summation over the binary field,
i.e., an XOR operation. For example, $0 \oplus 1 = \mathrm{mod}(0 + 1, 2) = 1$. Similarly, we have $0 \oplus 0 = 0$,
$1 \oplus 0 = 1$, $1 \oplus 1 = 0$. In addition, $-x = x$ holds in the binary field.

Generalized parity check equation. A conventional parity check equation is shown in (1). 
The right-hand side of the parity check equation is always 0. 

$$x_1 \oplus x_2 \oplus \cdots \oplus x_k = 0 \qquad (1)$$

In this paper, we define the generalized parity check equation, as shown in (2).

$$x_1 \oplus x_2 \oplus \cdots \oplus x_k = b \qquad (2)$$

On the right-hand side of equation (2), $b$ is a constant that can be either 0 or 1.

Let C be a standard parity check equation. If the values of some of the bits in the left-hand 
side of C are already known, then C can be equivalently rewritten as a generalized parity check 
equation. For example, if the values of the bits $x_{p+1}, x_{p+2}, \ldots, x_k$ are known, we move these
bits from the left-hand side of equation (1) to its right-hand side and rewrite it as follows.

$$x_1 \oplus x_2 \oplus \cdots \oplus x_p = x_{p+1} \oplus x_{p+2} \oplus \cdots \oplus x_k = b \qquad (3)$$
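
As a small illustration of the rewriting step in equation (3), the following Python sketch folds the known bits of a parity check into the constant $b$. The function name `to_generalized` and the dictionary-based representation are illustrative assumptions, not notation from the paper.

```python
from functools import reduce
from operator import xor


def to_generalized(check_bits, known):
    """Rewrite x_1 ^ ... ^ x_k = 0 as (XOR of unknown bits) = b.

    check_bits: iterable of bit indices that appear in the parity check.
    known: dict mapping some of those indices to their 0/1 values.
    Returns (unknown_indices, b), where b is the XOR of the known values.
    """
    unknown = [i for i in check_bits if i not in known]
    b = reduce(xor, (known[i] for i in check_bits if i in known), 0)
    return unknown, b


# Hypothetical check x1 ^ x2 ^ x3 ^ x4 = 0 with x3 = 1 and x4 = 0 already known.
unknown, b = to_generalized([1, 2, 3, 4], {3: 1, 4: 0})
```

Here the check reduces to $x_1 \oplus x_2 = b$ with $b$ equal to the XOR of the two known values.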
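Let $C_1, C_2, \ldots, C_p$ be $p$ generalized parity check equations, as shown in (4).

$$\begin{aligned}
C_1 &: x_{1,1} \oplus x_{1,2} \oplus \cdots \oplus x_{1,a_1} = b_1 \\
C_2 &: x_{2,1} \oplus x_{2,2} \oplus \cdots \oplus x_{2,a_2} = b_2 \\
&\;\;\vdots \\
C_p &: x_{p,1} \oplus x_{p,2} \oplus \cdots \oplus x_{p,a_p} = b_p
\end{aligned} \qquad (4)$$

We say $C_1, C_2, \ldots, C_p$ are dependent on each other if the corresponding homogeneous
equations in (5) are dependent on each other.

$$\begin{aligned}
x_{1,1} \oplus x_{1,2} \oplus \cdots \oplus x_{1,a_1} &= 0 \\
x_{2,1} \oplus x_{2,2} \oplus \cdots \oplus x_{2,a_2} &= 0 \\
&\;\;\vdots \\
x_{p,1} \oplus x_{p,2} \oplus \cdots \oplus x_{p,a_p} &= 0
\end{aligned} \qquad (5)$$

From equations (4) and (5), we derive that

$$b_1 \oplus b_2 \oplus \cdots \oplus b_p = 0 \qquad (6)$$

when the $p$ generalized parity check equations $C_1, C_2, \ldots, C_p$ are dependent on each other.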

Connected graph. A graph is connected if there exists a path from any vertex to any other
vertex in the graph. If a graph is not connected, we call it a disjoint graph.

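Relative complement of a subgraph $\mathcal{S}$ in a Tanner graph $\mathcal{G}$. Let $\mathcal{G}$ be a Tanner graph
and $\mathcal{S}$ be a subgraph of $\mathcal{G}$, i.e., $\mathcal{S} \subset \mathcal{G}$. We use the symbol $\mathcal{G} \setminus \mathcal{S}$ to denote the subgraph that
contains the nodes and edges in $\mathcal{G}$, but not in $\mathcal{S}$. For example, let $C_1, C_2, \ldots, C_k$ be $k$ check
nodes in a Tanner graph $\mathcal{G}$. The subgraph $\mathcal{G} \setminus \{C_1, C_2, \ldots, C_k\}$ represents the remaining graph
after deleting check nodes $C_1, C_2, \ldots, C_k$ from $\mathcal{G}$. Assume $\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_k$ are $k$ subgraphs in a
Tanner graph $\mathcal{G}$. The notation $\mathcal{G} \setminus \{\mathcal{G}_1 \cup \mathcal{G}_2 \cup \ldots \cup \mathcal{G}_k\}$ represents the subgraph whose nodes and
edges are in $\mathcal{G}$, but not in $\mathcal{G}_i$, $1 \le i \le k$.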

III. "Label-and-decide" Encoding Algorithm 

Initially, Tanner graphs [15] were developed to explain the decoding process for LDPC codes; 
in fact, they can be used for the encoding of LDPC codes as well [12]. To encode an LDPC code 
using its Tanner graph, we identify information bits and parity bits through a labeling process 
on the graph. After determining the information bits and the parity bits, we start by assigning 
numerical values to the bit nodes labeled as information bits and then in a second step, calculate 
the missing values of the parity bits sequentially. This encoding approach is named label-and-decide.
It is described in Algorithm 1.

Algorithm 1 Label-and-decide algorithm
Preprocessing (carry out only once):
    Label every bit node either as information bit or parity bit on the Tanner graph.
Encoding:
    Flag ← 0;
    Get the values of all the bits labeled as information bits;
    while there are parity bits undetermined do
        if there exists one undetermined parity bit x that can be uniquely computed from the values
        of the information bits and the already determined parity bits then
            Compute the value of x.
        else
            Flag ← 1, exit the while loop.
        end if
    end while
    if Flag = 1 then
        Encoding is unsuccessful.
    else
        Output the encoded codeword.
    end if



Example. Figure 1 shows on the left an LDPC code whose Tanner graph is a tree. Initially, all
its bit nodes are unlabeled. First, we randomly pick bit nodes $x_1$, $x_2$, and $x_3$ to be information
bits. According to the parity check equation $C_1$, the value of bit $x_4$ depends on the values of the
bits $x_1$, $x_2$, and $x_3$ such that $x_4 = x_1 \oplus x_2 \oplus x_3$. Therefore, $x_4$ should be labeled as a parity bit.
Similarly, we label bits $x_5$, $x_6$, $x_8$, $x_9$ as information bits and label bits $x_7$, $x_{10}$ as parity bits. We
represent information bits by solid circles and parity bits by empty circles. The labeling result
is shown on the right in Figure 1.

By the above labeling process, we decide the systematic component of the code word
$(x_1\ x_2\ x_3\ x_4\ x_5\ x_6\ x_7\ x_8\ x_9\ x_{10})$ to be $\vec{s} = (x_1\ x_2\ x_3\ x_5\ x_6\ x_8\ x_9)$ and the parity component
to be $\vec{p} = (x_4\ x_7\ x_{10})$.

Fig. 1. Left: A Tanner graph. Right: Labeling bit nodes on the Tanner graph shown on the left. (Legend: solid circle = bit node labeled as an information bit; empty circle = bit node labeled as a parity bit; square = check node.)

The label-and-decide encoding on the code in Figure 1 then has the following steps:

Step 1. Get the values of the information bits $x_1$, $x_2$, $x_3$, $x_5$, $x_6$, $x_8$, and $x_9$ from the encoder input;
Step 2. Compute the parity bit $x_4$ from the parity check equation $C_1$: $x_4 = x_1 \oplus x_2 \oplus x_3$;
Step 3. Compute the parity bit $x_7$ from the parity check equation $C_2$: $x_7 = x_4 \oplus x_5 \oplus x_6$; compute
the parity bit $x_{10}$ from the parity check equation $C_3$: $x_{10} = x_7 \oplus x_8 \oplus x_9$.

In fact, any tree code (whose Tanner graph is cycle-free) can be encoded in linear complexity 
by the label-and-decide algorithm. We will prove this fact in Corollary 2 in Section V. Further, 
the label-and-decide algorithm can be used to encode a particular type of Tanner graphs with 
cycles, i.e., the pseudo-tree we propose in the next section. 
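
The following Python sketch illustrates the encoding phase of Algorithm 1 under a few assumptions: the labeling is supplied by the caller, each parity check is given as a list of bit indices, and the function name `label_and_decide` is hypothetical. The sketch rescans all checks in every pass, so it does not attain the linear complexity discussed in the paper; it only demonstrates the control flow. The three checks form a chain with the shape of the worked example; the exact membership of the third check is an assumption.

```python
def label_and_decide(checks, parity_bits, info_values):
    """Sketch of the encoding phase of Algorithm 1.

    checks: list of lists of bit indices; the bits of each check XOR to 0.
    parity_bits: set of indices labeled as parity bits (preprocessing result).
    info_values: dict {index: 0/1} holding every information bit.
    Returns the full {index: value} codeword, or None if encoding gets stuck.
    """
    values = dict(info_values)
    undetermined = set(parity_bits)
    while undetermined:
        progress = False
        for check in checks:
            missing = [i for i in check if i not in values]
            if len(missing) == 1:  # exactly one unknown bit, so it is determined
                x = missing[0]
                v = 0
                for i in check:
                    if i != x:
                        v ^= values[i]
                values[x] = v
                undetermined.discard(x)
                progress = True
        if not progress:
            return None  # corresponds to Flag = 1: encoding is unsuccessful
    return values


# Chain of three checks: {x1..x4}, {x4..x7}, {x7..x10} (third check assumed).
checks = [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]]
parity_bits = {4, 7, 10}
info = {1: 1, 2: 0, 3: 1, 5: 1, 6: 1, 8: 0, 9: 1}
codeword = label_and_decide(checks, parity_bits, info)
```

A return value of `None` plays the role of the unsuccessful branch of Algorithm 1, which is the situation the later sections on encoding stopping sets address.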

IV. Pseudo-tree 

A pseudo-tree is a connected Tanner graph that satisfies the following conditions (A1) through (A4).
(A1) It is composed of $2P + 1$ tiers where $P$ is a positive integer. We number these tiers from 1
to $2P + 1$, starting from the top. The $(2i - 1)$-th tier ($i = 1, 2, \ldots, P + 1$) contains only bit
nodes, while the $(2i)$-th tier ($i = 1, 2, \ldots, P$) contains only check nodes.
(A2) Each bit node in the first tier has degree one and is connected to one and only one check
node in the second tier.
(A3) For each check node $C_\alpha$ in the $(2i)$-th tier, where $i$ can take any value from 1 to $P$, there
is one and only one bit node $x_\alpha$ in the $(2i - 1)$-th tier (immediate upper tier) that connects
to $C_\alpha$, and there are no other bit nodes in the upper tiers that connect to $C_\alpha$. We call $x_\alpha$
the parent of $C_\alpha$ and $C_\alpha$ the child of $x_\alpha$.
(A4) For each bit node $x_\beta$ in the $(2i - 1)$-th tier, where $i$ can take any value from 2 to $P$, there
is at most one check node $C_\beta$ in the $(2i)$-th tier (immediate lower tier) that connects to $x_\beta$,
and there are no other check nodes in the lower tiers that connect to $x_\beta$.

For example, Figure 2 shows a pseudo-tree with seven tiers. It contains many cycles. Each
check node $C_i$ in the pseudo-tree is connected to a unique bit node in the immediate upper tier,
while each bit node in the pseudo-tree may connect to multiple check nodes in the upper
tiers.

An important characteristic of a pseudo-tree is that it can be encoded in linear complexity by 
the label-and-decide algorithm. This is proved in the following lemma. 

Lemma 1 Any LDPC code whose Tanner graph is a pseudo-tree is linear time encodable. 

Proof: Let a pseudo-tree contain $2P + 1$ tiers, $N$ bit nodes, and $M$ check nodes. Condition (A3)
guarantees that each check node is connected to one and only one parent bit node in the immediate
upper tier. Condition (A4) guarantees that different check nodes are connected to different parent
bit nodes. Therefore, there are $M$ parent bit nodes for the check nodes. We label these $M$ parent
bit nodes as parity bits and the other $N - M$ bit nodes as information bits.

The inputs of the encoder provide the values for all the information bits. The task of the
encoder is to compute the values for all the parity bits. Let $x_\alpha$ be an arbitrary parity bit in the
$(2i - 1)$-th tier. By conditions (A3) and (A4), there is only one check node $C_\alpha$ in the lower
tiers that connects to $x_\alpha$. The value of $x_\alpha$ is uniquely determined by the parity check equation
represented by $C_\alpha$. According to condition (A3), all the bit nodes constrained by $C_\alpha$ except
for $x_\alpha$ are in tiers below the $(2i - 1)$-th tier. Therefore, the value of $x_\alpha$ depends only on the
values of the bit nodes below the (2i — l)-th tier. For example, as shown in Figure 2, parity 
bit xg is the parent bit node of the check node Cq. From the parity check equation Cq, we see 
that the value of xg is computed from the values of xn, xu, xie, and xis, which are located 

Fig. 2. A pseudo-tree. (Legend: solid circle = bit node labeled as an information bit; empty circle = bit node labeled as a parity bit; square = check node.)

below xg. We compute the values of the parity bits tier by tier, starting from the (2P — l)-th 
tier (the lowest tier that contains parity bits) and then progressing upwards. Each time we compute the value of a parity
bit, we only need the values of those bits (both information bits and parity bits) in lower tiers,
which are already known. Hence, this encoding process can proceed. The encoding process is
repeated until the values of all the parity bits in the first tier are known.

We evaluate the computational complexity of the above encoding process. Let $k_i$, $i = 1, 2, \ldots, M$,
denote the number of bits contained in the $i$-th parity check equation. The $i$-th parity check
equation determines the value of a parity bit with $(k_i - 2)$ XOR operations. So, $\sum_{i=1}^{M} (k_i - 2)$ XOR
operations are required to obtain all the $M$ parity bits. Let $\bar{k} = \frac{1}{M} \sum_{i=1}^{M} k_i$ denote the average
number of bits in the $M$ parity check equations; then the encoding complexity is $O(M(\bar{k} - 2))$.
For LDPC codes with uniform row weight $k$, the encoding complexity is $O(M(k - 2))$. The
above analysis shows that the encoding process is accomplished in linear time. This completes
the proof. □
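
The tier-by-tier procedure in the proof of Lemma 1, together with its XOR count of $\sum_i (k_i - 2)$, can be sketched in Python as follows. The five-tier pseudo-tree used here is a small hypothetical example (it is not the graph of Figure 2), and the function name `encode_pseudo_tree` and the `(parent, others)` representation are assumptions made for illustration.

```python
def encode_pseudo_tree(checks, info_values):
    """Encode a pseudo-tree bottom-up and count XOR operations.

    checks: list of (parent_bit, other_bits) pairs, ordered so that checks in
            lower tiers come first; parent_bit is the parity bit of the check.
    info_values: dict {index: 0/1} for all information bits.
    Returns (values, xor_count).
    """
    values = dict(info_values)
    xor_count = 0
    for parent, others in checks:
        v = values[others[0]]
        for i in others[1:]:
            v ^= values[i]
            xor_count += 1  # k_i - 2 XORs for a check containing k_i bits
        values[parent] = v
    return values, xor_count


# Hypothetical pseudo-tree with P = 2:
#   tier 1: x1, x2   tier 2: C1, C2   tier 3: x3, x4   tier 4: C3   tier 5: x5, x6
#   C1 = {x1, x3, x5}, C2 = {x2, x3, x4, x6}, C3 = {x4, x5, x6}
checks = [(4, [5, 6]), (1, [3, 5]), (2, [3, 4, 6])]
values, xor_count = encode_pseudo_tree(checks, {3: 1, 5: 1, 6: 0})
```

For this example the check degrees are 3, 3, and 4, so the count predicted by the proof is $(3-2) + (3-2) + (4-2) = 4$ XOR operations.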
We look at an example. We encode the pseudo-tree in Figure 2 as follows: 
Step 1. Determine the values of all the information bits 0:14, ^15, xie, xio, X12, X13, x^, xj, and xg; 
Step 2. Compute the parity bit xu from the parity check equation C-j : Xu — a;i4 © Xi^ © Xi%; 
Step 3. Compute the parity bit Xg from the parity check equation C5 : = Xiq © Xis © Xu © Xia; 

compute the parity bit Xg from the parity check equation Cg : Xg = Xn © 2:12 © © Xia; 
Step 4. Compute the parity bits xi, X2, ^3, and a;4 in the first tier by the parity check equations 
Ci, C2, C3, and C4 respectively: xi — x^q © ^5 © x-j, X2 — X5 ® xq ® X12 © xg, x^ — 

^5 © © Xu © Xu © Xs, X4 — Xq ® Xg ® Xio © Xg © X13. 

The above encoding process requires only 21 XOR operations. 

V. Encoding Stopping Set 
An encoding stopping set in a Tanner graph is a connected subgraph such that: 
(B1) If a check node $C$ is in an encoding stopping set, then all its associated bit nodes and the
edges that are incident on $C$ are also in the encoding stopping set.
(B2) Any bit node in an encoding stopping set is connected to at least two check nodes in the
encoding stopping set.
(B3) All the check nodes included in an encoding stopping set are independent of each other,
i.e., no parity check equation can be represented as the binary sum of other parity
check equations.

The number of check nodes in an encoding stopping set is called its size. If a connected Tanner
graph satisfies conditions (B1) and (B2) but not condition (B3), we call this Tanner graph a
pseudo encoding stopping set. For example, the Tanner graph shown in Figure 3 is not an
encoding stopping set but a pseudo encoding stopping set since it satisfies conditions (B1)
and (B2) but not condition (B3). The Tanner graph shown in Figure 4 is an encoding stopping
set. Its size is 9. Every bit node in this encoding stopping set has degree greater than or equal
to 2, and the check nodes are independent of each other. Note that the "encoding stopping
set" defined in this paper is different from the "stopping set" defined in [16]. Stopping sets are
used for the finite-length analysis of LDPC codes on the binary erasure channel, while encoding
stopping sets are used here to develop efficient encoding methods for LDPC codes.

Fig. 3. A pseudo encoding stopping set.

From the above definitions of pseudo-tree and encoding stopping set, we have the following lemma.

Lemma 2 Any pseudo-tree or union of pseudo-trees does not contain encoding stopping sets. 

The proof of Lemma 2 is straightforward. We omit it here. 
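
To make conditions (B2) and (B3) concrete, the following Python sketch tests them on a set of check nodes given as lists of bit indices. Condition (B1) is implicit in this representation because every check carries all of its bits, and connectivity is not tested. The function names `gf2_rank` and `satisfies_b2_b3` are hypothetical, and independence is checked by plain Gaussian elimination over the binary field, which is a convenient choice for the sketch rather than a method taken from the paper.

```python
def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            h = row.bit_length() - 1
            if h not in pivots:
                pivots[h] = row
                break
            row ^= pivots[h]  # eliminate the leading bit and keep reducing
    return len(pivots)


def satisfies_b2_b3(checks):
    """Return (b2, b3) for a candidate subgraph given by its check nodes."""
    degree = {}
    for c in checks:
        for i in set(c):
            degree[i] = degree.get(i, 0) + 1
    b2 = all(d >= 2 for d in degree.values())       # every bit in >= 2 checks
    masks = [sum(1 << i for i in set(c)) for c in checks]
    b3 = gf2_rank(masks) == len(checks)              # checks are independent
    return b2, b3


# Three checks forming a single cycle: (B2) holds, but the rows XOR to zero.
cycle = satisfies_b2_b3([[1, 2], [2, 3], [3, 1]])
# Four checks on four bits, each bit in three checks: (B2) and (B3) both hold.
dense = satisfies_b2_b3([[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]])
```

In the terminology above, the first example behaves like a pseudo encoding stopping set, while the second satisfies both tested conditions.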

We will show next that the label-and-decide algorithm cannot successfully encode encoding
stopping sets.

Theorem 1 An encoding stopping set cannot be encoded successfully by the label-and-decide
algorithm.

Proof: Let $\mathcal{E}_s$ be an encoding stopping set and suppose $\mathcal{E}_s$ can be encoded successfully by the
label-and-decide algorithm. Let $x_\alpha$ be the last parity bit being determined during the encoding
process. Since $\mathcal{E}_s$ is an encoding stopping set, $x_\alpha$ is connected to at least two check nodes $C_\beta$
and $C_\gamma$ by condition (B2). Further, by condition (B3), all the check nodes in $\mathcal{E}_s$, including $C_\beta$
and $C_\gamma$, are independent of each other. Hence, for certain encoder inputs, $C_\beta$ and $C_\gamma$ provide
different values for the parity bit $x_\alpha$. This contradicts the fact that every parity bit can be uniquely
determined successfully by the label-and-decide algorithm. Hence, the label-and-decide algorithm
cannot encode an encoding stopping set. This completes the proof. □

Conversely, if a Tanner graph does not contain any encoding stopping set, there must exist a
linear complexity encoder for the corresponding code.

Theorem 2 If a Tanner graph $\mathcal{G}$ does not contain any encoding stopping set, then it can be
encoded in linear time by the label-and-decide algorithm.

Proof: We first delete all redundant check nodes (i.e., those dependent on other check nodes) from
the Tanner graph $\mathcal{G}$. Next, we restrict our attention to the case that $\mathcal{G}$ is a connected graph. We
will show that $\mathcal{G}$ can be equivalently transformed into a pseudo-tree if it is free of any encoding
stopping set. Since $\mathcal{G}$ does not contain any encoding stopping set, $\mathcal{G}$ itself is not an encoding
stopping set. Hence, there exist some degree-one bit nodes in $\mathcal{G}$. We generate a multi-layer
graph $\mathcal{T}$ and place those degree-one bit nodes in the first tier of $\mathcal{T}$. Next, the check nodes that
connect to the degree-one bit nodes in the first tier of $\mathcal{T}$ are placed in the second tier of $\mathcal{T}$.
Notice that there exists at least one bit node $x_\alpha$ in $\mathcal{G} \setminus \mathcal{T}$ that connects to at most one
check node in $\mathcal{G} \setminus \mathcal{T}$; otherwise, $\mathcal{G} \setminus \mathcal{T}$ would be an encoding stopping set,
which contradicts the fact that $\mathcal{G}$ does not contain any encoding stopping set. We pick all the
bit nodes in $\mathcal{G} \setminus \mathcal{T}$ that connect to at most one check node in $\mathcal{G} \setminus \mathcal{T}$ and place them in the third
tier of $\mathcal{T}$. Correspondingly, those check nodes in $\mathcal{G} \setminus \mathcal{T}$ that connect to the bit nodes in the third
tier of $\mathcal{T}$ are placed in the fourth tier of $\mathcal{T}$. Each time we find bit nodes in $\mathcal{G} \setminus \mathcal{T}$ that connect
to at most one check node in $\mathcal{G} \setminus \mathcal{T}$, we place those bit nodes in a new tier $2s + 1$ of $\mathcal{T}$ and
place the check nodes connecting to those bit nodes in the following new tier $2s + 2$ of $\mathcal{T}$. We
continue finding such bit nodes and adding tiers until all the nodes in $\mathcal{G}$ are included in $\mathcal{T}$.

Fig. 4. An encoding stopping set whose proper subgraph is a pseudo-tree shown in Figure 2.

Fig. 5. Left: A multi-layer structure but not a pseudo-tree. (Note that $C_4$ has two parents $x_6$ and $x_7$ and $C_1$ has two parents $x_1$ and $x_2$.) Right: The pseudo-tree that evolves from the multi-layer structure shown on the left.

The multi-layer structure constructed so far satisfies the conditions (A1), (A2), and (A4).
Condition (A3) may fail to be satisfied. For example, as shown on the left in Figure 5, the
check node $C_4$ in tier 4 is connected to two bit nodes $x_6$ and $x_7$ in tier 3, which contradicts
condition (A3). To satisfy condition (A3), we further adjust the positions of the bit nodes. If
a check node in tier $2i$ is connected to $k$ bit nodes in the upper tiers of $\mathcal{T}$, we pick one bit
node in tier $2i - 1$ from these $k$ bit nodes and leave its position unchanged. Next, we drag
the other $k - 1$ bit nodes from their initial positions in tier $2i - 1$ to the $(2i + 1)$-th tier. To
illustrate, let us focus on Figure 5 again. We drag the bit node $x_7$ from tier 3 to tier 5 and
drag the bit node $x_1$ from tier 1 to tier 3. The newly formed graph is shown on the right in
Figure 5, which follows condition (A3). By tuning the positions of the bit nodes in this way,
the resulting hierarchical graph satisfies conditions (A1) to (A4). In this way, we transform $\mathcal{G}$
into a pseudo-tree. By Lemma 1, a pseudo-tree is linear time encodable. Therefore, the encoding
complexity of $\mathcal{G}$ is $O(M)$, where $M$ denotes the number of independent check nodes contained
in $\mathcal{G}$.

We now prove the case that $\mathcal{G}$ is a disjoint graph. Let $\mathcal{G}$ contain $p$ connected subgraphs:
$\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_p$. By the above analysis, the complexity of encoding $\mathcal{G}_i$ is $O(M_i)$, $i = 1, 2, \ldots, p$,
where $M_i$ denotes the number of independent check nodes contained in $\mathcal{G}_i$. Since $\mathcal{G} = \mathcal{G}_1 \cup \mathcal{G}_2 \cup \ldots \cup \mathcal{G}_p$,
the encoding complexity of $\mathcal{G}$ is $\sum_{i=1}^{p} O(M_i) = O\left(\sum_{i=1}^{p} M_i\right) = O(M)$, where $M$
is the number of independent check nodes in $\mathcal{G}$. This completes the proof. □
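
The constructive argument in the proof of Theorem 2 can be sketched as a peeling procedure. The Python sketch below is a simplified variant, not the paper's exact tier construction: it removes one check at a time, choosing a bit that touches exactly one remaining check as that check's parent, which avoids the separate adjustment for condition (A3). The function name `peel` and the dictionary representation are assumptions, the checks are assumed to be independent, and degrees are recomputed in every pass, so this version is not linear time.

```python
def peel(checks):
    """Assign a parent bit to every check by repeated peeling.

    checks: dict {check_name: set of bit indices}.
    Returns a list of (parent_bit, check_name) in peeling order, or None when
    every bit of the remaining graph touches at least two remaining checks.
    """
    remaining = {c: set(bits) for c, bits in checks.items()}
    order = []
    while remaining:
        degree = {}
        for c, bits in remaining.items():
            for i in bits:
                degree.setdefault(i, []).append(c)
        # Smallest-index bit that touches exactly one remaining check.
        leaf = next((i for i, cs in sorted(degree.items()) if len(cs) == 1), None)
        if leaf is None:
            return None  # peeling is stuck on the remaining subgraph
        c = degree[leaf][0]
        order.append((leaf, c))
        del remaining[c]
    return order


# Hypothetical graph: C1 = {x1, x3, x5}, C2 = {x2, x3, x4, x6}, C3 = {x4, x5, x6}.
order = peel({"C1": {1, 3, 5}, "C2": {2, 3, 4, 6}, "C3": {4, 5, 6}})
# A single cycle of three checks has no degree-one bit, so peeling fails.
stuck = peel({"A": {1, 2}, "B": {2, 3}, "C": {3, 1}})
```

Encoding then visits the checks in the reverse of the peeling order, so that each parent bit is computed only from bits that are already known.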
From Theorem 2, we easily derive the following corollaries. 

Corollary 1 If a Tanner graph does not contain any encoding stopping set, then it can be 
represented by a union of pseudo-trees. 

The proof of Corollary 1 can be found in the proof of Theorem 2. 

Corollary 2 The label-and-decide algorithm can encode any tree LDPC code (whose Tanner
graph is cycle-free) with linear complexity.

Proof: Let $\mathcal{T}$ be the Tanner graph of a tree LDPC code and $\mathcal{S}$ be an arbitrary subgraph of $\mathcal{T}$.
Since the Tanner graph $\mathcal{T}$ is a tree, its subgraph $\mathcal{S}$ is either a tree or a union of trees. Therefore,
the graph $\mathcal{S}$ contains at least one bit leaf node with degree one. Since the graph $\mathcal{S}$ contains
a degree-one bit node, $\mathcal{S}$ cannot be an encoding stopping set. Since no subgraph of $\mathcal{T}$ is an
encoding stopping set, by Theorem 2, the tree code $\mathcal{T}$ can be encoded in linear complexity by
the label-and-decide algorithm. This completes the proof. □

Corollary 3 A regular LDPC code with column weight 2 (cycle code) can be encoded in linear 
complexity by the label-and-decide algorithm. 

Proof: We prove Corollary 3 by showing that a cycle code does not contain any encoding
stopping set. Assume the cycle code contains an encoding stopping set $\mathcal{E}_s$. By the definition
of cycle code and condition (B2), all the bit nodes in $\mathcal{E}_s$ have uniform degree two. It follows
that the binary sum of all the parity check equations in $\mathcal{E}_s$ is a vector of zeros. Then, at least one
check node in $\mathcal{E}_s$ is dependent on the other check nodes. This contradicts condition (B3) that
all the check nodes in an encoding stopping set are independent of each other. Hence, a cycle
code does not contain any encoding stopping set. By Theorem 2, a cycle code is linear time
encodable by the label-and-decide algorithm. This completes the proof. □
An alternative proof can be found in [4]. 
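
The key step in the proof of Corollary 3 is that, when every bit node has degree two, the binary sum of all parity check equations vanishes. The short Python sketch below checks this on a hypothetical four-check cycle; the function name `xor_of_rows` is an illustrative assumption.

```python
def xor_of_rows(checks):
    """XOR all parity-check rows; returns the set of bits with odd total degree."""
    odd = set()
    for c in checks:
        odd ^= set(c)  # symmetric difference acts as bitwise XOR of the rows
    return odd


# Hypothetical cycle code fragment: every bit appears in exactly two checks.
residue = xor_of_rows([[1, 2], [2, 3], [3, 4], [4, 1]])
```

An empty result means the rows sum to the all-zero vector, so at least one of the checks depends on the others and condition (B3) cannot hold.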
Corollary 4 Let $H$ be the parity check matrix of an LDPC code. If $H$ can be transformed into
an upper triangular matrix $U$ by row and column permutations, then the LDPC code can be
encoded in linear time by the label-and-decide algorithm.

Proof: We label the rows of the upper triangular matrix $U$ one by one as $C_1, C_2, \ldots, C_M$, from
the bottom to the top, as shown in Figure 6. We notice that if $i > j$, then there exists at least
one bit that is contained in $C_i$ but not in $C_j$. Assume the Tanner graph of the code contains an
encoding stopping set $\mathcal{E}_s$ that contains check nodes $C_{i_1}, C_{i_2}, \ldots, C_{i_p}$. Let $q = \max(i_1, i_2, \ldots, i_p)$.
There exists at least one bit node $x_q$ in $\mathcal{E}_s$ that only connects to $C_q$. This contradicts the fact
that every bit node in an encoding stopping set is connected to at least two check nodes in
the encoding stopping set. Hence, $\mathcal{E}_s$ is not an encoding stopping set. Since the Tanner graph
of the LDPC code does not contain any encoding stopping set, by Theorem 2 it is linear time
encodable. This completes the proof. □
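
For a matrix of the kind considered in Corollary 4, encoding amounts to back substitution. The Python sketch below assumes a specific layout that is not confirmed by the text: the first $M$ columns of $H$ form an upper triangular block with ones on the diagonal, and the remaining $N - M$ columns carry the information bits. The function name `encode_upper_triangular` is hypothetical, and the dense row scan is used only for brevity; a sparse representation would be needed to reach linear complexity.

```python
def encode_upper_triangular(H, info):
    """Back-substitution encoder for an assumed layout H = [T | A].

    H: list of M rows (lists of 0/1) of length N; the first M columns are
       assumed to be upper triangular with ones on the diagonal.
    info: list of N - M information bits, placed in the last N - M columns.
    Returns the full codeword as a list of N bits.
    """
    M, N = len(H), len(H[0])
    x = [0] * M + list(info)
    for r in range(M - 1, -1, -1):  # bottom row first
        v = 0
        for col in range(r + 1, N):
            if H[r][col]:
                v ^= x[col]
        x[r] = v  # the diagonal bit of row r is the parity bit of that row
    return x


# Hypothetical 3 x 5 parity check matrix in the assumed layout.
H = [
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1],
]
codeword = encode_upper_triangular(H, [1, 0])
syndromes = [sum(h * c for h, c in zip(row, codeword)) % 2 for row in H]
```

Each row is solved for its diagonal bit using only bits to its right, which mirrors the bottom-to-top labeling of the rows in the proof.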
Theorems 1 and 2 show that encoding stopping sets prevent the application of the label-and- 
decide algorithm. However, we will show in the next section that encoding stopping sets can 
also be encoded in linear complexity. 

VI. Linear complexity encoding approach for encoding stopping sets 

Let $\mathcal{E}_s$ be an encoding stopping set. We say $\mathcal{E}_s$ is a $k$-fold-constraint encoding stopping set if
the following two conditions hold.
(C1) There exist $k$ check nodes $C_1, C_2, \ldots, C_k$ in $\mathcal{E}_s$ such that $\mathcal{E}_s \setminus \{C_1, C_2, \ldots, C_k\}$ does not
contain any encoding stopping set. We call the $k$ check nodes $C_1, C_2, \ldots, C_k$ key check
nodes.
(C2) For any $k - 1$ check nodes $C_1, C_2, \ldots, C_{k-1}$ in $\mathcal{E}_s$, $\mathcal{E}_s \setminus \{C_1, C_2, \ldots, C_{k-1}\}$ contains an
encoding stopping set.

The notation $\mathcal{E}_s \setminus \{C_1, C_2, \ldots, C_k\}$ denotes the remaining graph after deleting check nodes $C_1$,
$C_2, \ldots, C_k$ from $\mathcal{E}_s$. Figure 4 shows a 2-fold-constraint encoding stopping set. After deleting
the two key check nodes $C_8$ and $C_9$ from this encoding stopping set, the Tanner graph turns into
a pseudo-tree, see Figure 2. We will focus on 1-fold-constraint and 2-fold-constraint encoding
stopping sets in this paper, since we will show later that all types of LDPC codes can be
decomposed into 1-fold or 2-fold constraint encoding stopping sets and pseudo-trees.

Let us first look at a 2-fold-constraint encoding stopping set $\mathcal{E}_s$ of size $M$. By definition, there
exist two key check nodes $C_\alpha$ and $C_\beta$ in $\mathcal{E}_s$ such that $\mathcal{E}_s \setminus \{C_\alpha, C_\beta\}$ does not contain any encoding
stopping set. We encode $\mathcal{E}_s$ in three steps. In the first step, we encode $\mathcal{E}_s \setminus \{C_\alpha, C_\beta\}$ using the
label-and-decide algorithm according to Theorem 2. During encoding, $M - 2$ bit nodes are
labeled as parity bits and the remaining bit nodes are labeled as information bits. In the second
step, we verify the two key check nodes $C_\alpha$ and $C_\beta$ based on the bit values acquired in Step 1.
The key check nodes $C_\alpha$ and $C_\beta$ indicate that two bits $x_\gamma$ and $x_\delta$ that were previously labeled as
information bits are actually parity bits, and their values are determined by $C_\alpha$ and $C_\beta$. We call
the two bits $x_\gamma$ and $x_\delta$ reevaluated bits. The bits $x_\gamma$ and $x_\delta$ satisfy the following three conditions.

(D1) $x_\gamma$ is constrained by the parity check equation $C_\alpha$.
(D2) $x_\delta$ is constrained by the parity check equation $C_\beta$.
(D3) $x_\gamma$ and $x_\delta$ are not both contained in $C_\alpha$ and $C_\beta$.

Since $C_\alpha$, $C_\beta$, and the other check nodes in $\mathcal{E}_s$ are independent of each other, there must exist bit
nodes $x_\gamma$ and $x_\delta$ that satisfy conditions (D1) to (D3). An algorithm for finding reevaluated bits $x_\gamma$
and $x_\delta$ from a 2-fold-constraint encoding stopping set is presented in Appendix A. Notice that
finding the two key check nodes and the two reevaluated bits is a preprocessing
step that is carried out only once. Assume in step 1 that $x_\gamma$ and $x_\delta$ are randomly assigned
initial values $x_\gamma^0$ and $x_\delta^0$, respectively. If the parity check equations $C_\alpha$ and $C_\beta$ are both satisfied,
the initial values $x_\gamma^0$ and $x_\delta^0$ are the correct values for $x_\gamma$ and $x_\delta$. If $C_\alpha$, or $C_\beta$, or both, are
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not satisfied, we need to recompute tlie values of and from tlie values of the key check 
nodes Cq, and C^. Let Ax-y — x^ — x^ and Aa;^ — xs — x^^ where x^ and £5 are the correct 
values of x^ and xs, respectively, and let Cq, and Cp be the values of the key check nodes Cq, 
and Cfj, respectively. If x^ is contained in both Cq and Cp, and xs is only contained in C^, we 
derive the following equations. 

^ ^ (7) 
Ax-y © Axs = -Cfj = C/s 

If Xs is contained in both Cq and Cp, and Xj is only contained in Cq, we derive the following 
equations. 

AX^ ® AXS = -Cq = Cq 

^ — (o) 

Ax^ = -Cp = C/3 

If x,y is only contained in Cq and xs is only contained in Cp, we have the following equations. 

Z Z (9) 
Axs = -Cf3 = Cp 

From equations (7) to (9), we can get the correct values of x^ and xs- In the third step, we 
recompute those parity bits that are affected by the new values of and x^. This encoding 
method is named label-decide-recompute and is described in Algorithm 2. 
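
The three cases of equations (7) to (9) can be sketched as a small GF(2) solver. This is only an illustrative sketch: the function name `solve_deltas` and the two boolean flags are hypothetical names introduced here, and the code assumes that the values of the two key check nodes have already been computed after the label-and-decide pass.

```python
def solve_deltas(c_alpha, c_beta, gamma_in_beta, delta_in_alpha):
    """Return (delta_x_gamma, delta_x_delta) over GF(2).

    c_alpha, c_beta : values (0 or 1) of the two key check nodes.
    gamma_in_beta   : True if x_gamma also appears in C_beta (hypothetical flag).
    delta_in_alpha  : True if x_delta also appears in C_alpha (hypothetical flag).
    """
    if gamma_in_beta and delta_in_alpha:
        # Both bits in both key checks: excluded by condition (D3).
        raise ValueError("condition (D3) is violated")
    if gamma_in_beta:            # case of equation (7)
        d_gamma = c_alpha
        d_delta = c_beta ^ d_gamma
    elif delta_in_alpha:         # case of equation (8)
        d_delta = c_beta
        d_gamma = c_alpha ^ d_delta
    else:                        # case of equation (9)
        d_gamma, d_delta = c_alpha, c_beta
    return d_gamma, d_delta


if __name__ == "__main__":
    print(solve_deltas(1, 1, gamma_in_beta=True, delta_in_alpha=False))
```

Because subtraction and addition coincide in the binary field, each case reduces to at most one XOR, which matches the single extra XOR counted in the complexity analysis.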

Algorithm 2 Label-decide-recompute algorithm for a 2-fold-constraint encoding stopping set $\mathcal{E}_j$ of size $M$

Preprocessing (carry out only once):
    Find two check nodes $C_\alpha$ and $C_\beta$ such that $\mathcal{E}_j \setminus \{C_\alpha, C_\beta\}$ does not contain any encoding stopping set;
    Use Algorithm 8 to pick two information bits $x_\gamma$ and $x_\delta$ that satisfy conditions (D1) to (D3);
    Determine the parity bits $x_{p_1}, x_{p_2}, \ldots, x_{p_s}$ that are affected by the values of $x_\gamma$ and $x_\delta$;
Encoding:
    Fill the values of the information bits except for $x_\gamma$ and $x_\delta$;
    Assign $x_\gamma = 0$ and $x_\delta = 0$;
    Encode $\mathcal{E}_j \setminus \{C_\alpha, C_\beta\}$ using Algorithm 1. Compute the values of the $M - 2$ parity bits;
    Compute the values $\hat{C}_\alpha$ and $\hat{C}_\beta$ of the key check nodes $C_\alpha$ and $C_\beta$, respectively;
    if $\hat{C}_\alpha \neq 0$ or $\hat{C}_\beta \neq 0$ then
        Recompute the values of $x_\gamma$ and $x_\delta$ from $\hat{C}_\alpha$ and $\hat{C}_\beta$ by equations (7) to (9);
        for $i = 1$ to $s$ do
            Recompute the value of the parity bit $x_{p_i}$ based on the new values of $x_\gamma$ and $x_\delta$;
        end for
    end if
    Output the encoding result.

Next, we analyze the computation complexity of the label-decide-recompute algorithm when
encoding a 2-fold-constraint encoding stopping set. Every check node except for the two key
check nodes $C_\alpha$ and $C_\beta$ is computed at most twice in the label-decide-recompute encoding
(label-and-decide step and recompute step) while the two key check nodes $C_\alpha$ and $C_\beta$ need
to be computed only once. In addition, we need one extra XOR operation to compute the two
reevaluated bits $x_\gamma$ and $x_\delta$ by equations (7) to (9). Hence, the encoding complexity of the
label-decide-recompute algorithm is less than or equal to
$2 \cdot \sum_{i=1}^{M-2} (k_i - 2) + (k_\alpha - 1) + (k_\beta - 1) + 1$,
where $k_i$, $1 \le i \le M - 2$, are the degrees of the check nodes other than $C_\alpha$, $C_\beta$, and $k_\alpha$, $k_\beta$
are the degrees of the check nodes $C_\alpha$ and $C_\beta$, respectively. The encoding complexity of the
label-decide-recompute algorithm can be further simplified to be less than $2 \cdot M \cdot (k - 1)$, where
$M$ is the number of check nodes in the encoding stopping set and $k$ is the average number
of bit nodes contained in each check node in the encoding stopping set. This shows that the
label-decide-recompute algorithm encodes any 2-fold-constraint encoding stopping set in linear
time. The preprocessing (determining key check nodes, reevaluated bits, and parity bits affected
by the reevaluated bits) is done offline and does not count towards encoder complexity.
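
The operation count above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the sample degrees (seven ordinary check nodes of degree 6, key check nodes of degree 6 and 5) are assumed values chosen for the demonstration.

```python
def ldr_xor_count(other_degrees, k_alpha, k_beta):
    """Upper bound on XORs for label-decide-recompute.

    Each non-key check node is evaluated at most twice (k_i - 2 XORs each time),
    each key check node once (k - 1 XORs), plus one XOR for equations (7)-(9).
    """
    return 2 * sum(k - 2 for k in other_degrees) + (k_alpha - 1) + (k_beta - 1) + 1


def simplified_bound(other_degrees, k_alpha, k_beta):
    """Simplified bound 2*M*(k-1), written as 2*(sum of degrees - M) to stay in integers."""
    degrees = list(other_degrees) + [k_alpha, k_beta]
    return 2 * (sum(degrees) - len(degrees))


if __name__ == "__main__":
    degrees = [6] * 7  # assumed degrees of the M - 2 ordinary check nodes
    print(ldr_xor_count(degrees, 6, 5), simplified_bound(degrees, 6, 5))
```

The identity $M \cdot (k - 1) = \sum_i k_i - M$ is used so that the bound is computed without floating-point averages.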

We look at an example. Figure 4 shows a 2-fold-constraint encoding stopping set £j. After 
deleting the two check nodes Cg and Cg, Sj becomes the pseudo-tree shown in Figure 2. In 

addition, the value of the bit node X5 affects Cg and the value of the bit node xg affects Cg. 
Hence, the two bits X5 and xg are reevaluated bits. We use the label-decide-recompute algorithm 
to encode Sj as follows. 

Step 1. Assign x^ — and xg = 0. Encode the pseudo-tree part following the procedures on 
page 10. 

Step 2. Compute the values of the key check nodes Cg and Cg, e.g., Cg = xi © X2 © xs © X4 and 
Cg = Xi (B X2 (B Xg (B XiQ.

Step 3a. If Cg = and Cg — 0, stop encoding and output the codeword [xi X2 ■ ■ ■ xiq\. 
Step 3b. If = 1 or Cg = 1, recompute the values of x^ and xg as follows: X5 — Cg and 
xs — Cg, where Cg and Cg are the values of the parity check equations Cg and Cg, 
respectively. Recompute the parity bits xi, X2, X3, and X4 based on the new values 
of X5 and xg. Output the codeword [xi X2 ■ ■ ■ xiq]. 
The label-decide-recompute algorithm can be further simplified. We restudy the third step of
the label-decide-recompute method. Assume $p_1, p_2, \ldots, p_m$ are the parity bits whose values
need to be updated. In order to get the new values of the parity bits $p_1, p_2, \ldots, p_m$, we need
to recompute those parity check equations that relate to $p_1, p_2, \ldots, p_m$. In fact, instead of
recomputing the parity check equations relating to $p_1, p_2, \ldots, p_m$, we can directly flip the values
of the parity bits $p_1, p_2, \ldots, p_m$, since in the binary field the value of a bit is either 0 or 1.
For example, if the correct value of $x_\gamma$ is different from its original value $x_\gamma^0$, we simply flip
the values of those parity bits that are affected by $x_\gamma$. We name the above encoding method
label-decide-flip and describe it in Algorithm 3. The encoding complexity of Algorithm 3 is
$M \cdot (k - 1)$ XOR operations plus two vector flipping operations. We, again, look at an example.
The 2-fold-constraint encoding stopping set shown in Figure 4 can be encoded by Algorithm 3
as follows.

Step 1. Assign x^ = and xg — 0. Encode the pseudo-tree part following the procedures on 
page 10. 

Step 2. Compute the values of the parity check equations Cg and Cg, e.g., Cs — xi ® X2 ® xs ® X4 
and Cg — xi ® X2 ® xg ® xiq. 
Step 3a. If Cg = and Cg = 0, stop encoding and output the codeword [xi X2 . ■ ■ xiq\. 
Step 3b. If Cg = 1 or Cg = 1, recompute the values of x^ and Xg as the following: X5 = Cg 
and xs — Cg where Cg and Cg are the values of the parity check equations Cg and Cg, 
respectively. If X5 — 1, flip the vector [xi X2 xs] to be [~ xi ~ a;2 ~ xs\. If xg = 1, 
flip the vector [^3 X4]. If a;5 ® xg = 1, flip the vector [^4]. Output the codeword 
[xi X2 ... xie]. 
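
The reason flipping can replace recomputation is that every affected parity bit is an XOR that contains the changed bit an odd number of times. A minimal sketch follows, using a hypothetical dependency chain ($p_1 = a \oplus x$, $p_2 = p_1 \oplus b$) that is not taken from Figure 4; all names are assumptions made for illustration.

```python
def recompute(a, b, x):
    """Recompute two parity bits from scratch (hypothetical toy dependencies)."""
    p1 = a ^ x       # p1 depends directly on the reevaluated bit x
    p2 = p1 ^ b      # p2 depends on x through p1
    return [p1, p2]


def flip(vector):
    """Flip every bit of a vector over GF(2)."""
    return [bit ^ 1 for bit in vector]


def encode_with_flip(a, b, x_correct):
    """Label-and-decide pass with x = 0, then flip the affected parity bits if x must be 1."""
    parities = recompute(a, b, 0)
    if x_correct == 1:
        parities = flip(parities)
    return parities


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for x in (0, 1):
                print(a, b, x, encode_with_flip(a, b, x) == recompute(a, b, x))
```

In this toy case the flip costs no XORs against other bits, which is the saving that the label-decide-flip variant exploits.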

Algorithm 3 Label-decide-flip algorithm for a 2-fold-constraint encoding stopping set $\mathcal{E}_j$ of size $M$

Preprocessing (carry out only once):
    Find two check nodes $C_\alpha$ and $C_\beta$ such that $\mathcal{E}_j \setminus \{C_\alpha, C_\beta\}$ does not contain any encoding stopping set;
    Use Algorithm 8 to pick two information bits $x_\gamma$ and $x_\delta$ that satisfy conditions (D1) to (D3);
    Determine the parity bits $x_{u_1}, x_{u_2}, \ldots, x_{u_r}$ that are affected by the value of $x_\gamma$ and group them in a vector $X_\gamma = [x_{u_1}\ x_{u_2}\ \ldots\ x_{u_r}]$;
    Determine the parity bits $x_{p_1}, x_{p_2}, \ldots, x_{p_s}$ that are affected by the value of $x_\delta$ and group them in a vector $X_\delta = [x_{p_1}\ x_{p_2}\ \ldots\ x_{p_s}]$;
    Determine the parity bits $x_{q_1}, x_{q_2}, \ldots, x_{q_t}$ that are affected by the value of $x_\gamma \oplus x_\delta$ and group them in a vector $X_{\gamma\delta} = [x_{q_1}\ x_{q_2}\ \ldots\ x_{q_t}]$;
Encoding:
    Fill the values of the information bits except for $x_\gamma$ and $x_\delta$;
    Assign $x_\gamma = 0$ and $x_\delta = 0$;
    Encode $\mathcal{E}_j \setminus \{C_\alpha, C_\beta\}$ using Algorithm 1. Compute the values of the $M - 2$ parity bits;
    Compute the values $\hat{C}_\alpha$ and $\hat{C}_\beta$ of the parity check equations $C_\alpha$ and $C_\beta$, respectively;
    if $\hat{C}_\alpha \neq 0$ or $\hat{C}_\beta \neq 0$ then
        Recompute the values of $x_\gamma$ and $x_\delta$ from $\hat{C}_\alpha$ and $\hat{C}_\beta$ by equations (7) to (9);
        if $x_\gamma = 1$ then
            Flip the vector $X_\gamma$.
        end if
        if $x_\delta = 1$ then
            Flip the vector $X_\delta$.
        end if
        if $x_\gamma \oplus x_\delta = 1$ then
            Flip the vector $X_{\gamma\delta}$.
        end if
    end if
    Output the encoding result.

It is easy to revise Algorithm 2 and Algorithm 3 to encode a 1-fold-constraint encoding
stopping set. For example, Algorithm 4 shows the label-decide-recompute algorithm for a
1-fold-constraint encoding stopping set. The encoding complexity of Algorithm 4 is less than
$2 \cdot M \cdot (k - 1)$, where $M$ is the number of check nodes in the encoding stopping set and $k$ is the
average number of bit nodes contained in each check node in the encoding stopping set.

Algorithm 4 Label-decide-recompute algorithm for a 1-fold-constraint encoding stopping set $\mathcal{E}_j$ of size $M$

Preprocessing (carry out only once):
    Find a check node $C^*$ such that $\mathcal{E}_j \setminus C^*$ does not contain any encoding stopping set.
    Pick an information bit $x^*$ that is constrained by the parity check equation $C^*$.
    Determine the parity bits $x_{p_1}, x_{p_2}, \ldots, x_{p_s}$ that are affected by $x^*$.
Encoding:
    Fill the values of the information bits except for $x^*$.
    Assign $x^* = 0$.
    Encode $\mathcal{E}_j \setminus C^*$ using Algorithm 1, compute the values of the $M - 1$ parity bits.
    Verify the parity check equation $C^*$.
    if the parity check equation $C^*$ is not satisfied then
        $x^* \leftarrow 1$.
        for $i = 1$ to $s$ do
            Recompute the value of the parity bit $x_{p_i}$ based on the new value of $x^*$;
        end for
    end if
    Output the encoding result.
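
The control flow of Algorithm 4 can be illustrated on a deliberately tiny system. The sketch below is an assumption-laden toy: the two checks (an ordinary check $p_1 \oplus a \oplus x^* = 0$ and a key check $p_1 \oplus b = 0$) are hypothetical, are not taken from the paper's figures, and are not claimed to satisfy the formal definition of an encoding stopping set; they only show the "assign 0, verify, correct, recompute" pattern.

```python
def encode_one_fold(a, b):
    """Toy label-decide-recompute with one key check (hypothetical example).

    Ordinary check C1 : p1 ^ a ^ x_star == 0
    Key check   C*    : p1 ^ b == 0
    """
    x_star = 0                 # tentative value of the reevaluated bit
    p1 = a ^ x_star            # label-and-decide pass: parity bit from C1
    c_star = p1 ^ b            # value of the key check C*
    if c_star != 0:            # C* is violated, so x_star must be 1
        x_star = 1
        p1 = a ^ x_star        # recompute the parity bit affected by x_star
    return {"a": a, "b": b, "x_star": x_star, "p1": p1}


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(encode_one_fold(a, b))
```

Both checks hold for every choice of the free bits `a` and `b`, and the parity bit is evaluated at most twice, in line with the stated bound.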



VII. Linear complexity encoding for general LDPC codes 

In this section, we propose a linear complexity encoding method for general LDPC codes. We
will show that any Tanner graph can be decomposed into pseudo-trees and encoding stopping
sets that are 1-fold-constraint or 2-fold-constraint. By encoding each pseudo-tree or encoding
stopping set using Algorithm 1 or Algorithm 2, we achieve linear time encoding for arbitrary
LDPC codes.

To proceed, we provide the following definition. Given a Tanner graph $\mathcal{G}$ and its subgraph $\mathcal{S}$,
we call the bit nodes in $\mathcal{G}$ but not in $\mathcal{S}$ the outsider nodes of $\mathcal{S}$. For example, Figure 7 shows a
Tanner graph $\mathcal{G}$ and its subgraph $\mathcal{S}$. Since in Figure 7 the two bit nodes $x_1$ and $x_2$ are in $\mathcal{G}$ but
not in $\mathcal{S}$, $x_1$ and $x_2$ are outsider nodes of $\mathcal{S}$. The check node $C_2$ contains two outsider nodes
of $\mathcal{S}$, i.e., is connected to two outsider nodes. The check node $C_1$ contains zero outsider nodes
of $\mathcal{S}$.



Fig. 7. Outsider nodes. 
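
Counting outsider nodes is a set difference, and the greedy rule used later (always add the check node with the fewest outsider nodes) is a minimum over that count. The sketch below uses hypothetical node labels and a hypothetical connectivity that merely mimics the description of Figure 7 ($C_1$ with zero outsider nodes, $C_2$ with two); the function names are assumptions as well.

```python
def outsider_count(check_bits, subgraph_bits):
    """Number of bit nodes of a check node that lie outside the subgraph S."""
    return len(set(check_bits) - set(subgraph_bits))


def pick_next_check(candidate_checks, subgraph_bits):
    """Return the name of the candidate check node with the fewest outsider nodes."""
    return min(
        sorted(candidate_checks),
        key=lambda name: outsider_count(candidate_checks[name], subgraph_bits),
    )


if __name__ == "__main__":
    s_bits = {"x3", "x4", "x5"}                                   # assumed bits of S
    checks = {"C1": ["x3", "x4"], "C2": ["x1", "x2", "x3"]}       # assumed connectivity
    print({name: outsider_count(bits, s_bits) for name, bits in checks.items()})
    print(pick_next_check(checks, s_bits))
```

Sorting the names before taking the minimum only makes tie-breaking deterministic; the text does not prescribe a tie-breaking rule.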

We start from LDPC codes with maximum column weight 3 by proving the following lemma. 

Lemma 3 Assume the maximum bit node degree of a Tanner graph $\mathcal{G}$ is three, then one of the
following statements must be true.

(E1) There are no pseudo encoding stopping sets or encoding stopping sets in $\mathcal{G}$.

(E2) There exists a pseudo encoding stopping set in $\mathcal{G}$. All the bit nodes in the pseudo encoding
stopping set have uniform degree 2.

(E3) There exists a 1-fold-constraint or a 2-fold-constraint encoding stopping set in $\mathcal{G}$.

Proof: We only need to prove either condition (E2), or condition (E3), is true if $\mathcal{G}$ contains a
pseudo encoding stopping set, or an encoding stopping set, respectively. We prove this statement
by constructing a subgraph $\mathcal{S}$ from the Tanner graph $\mathcal{G}$. Initially $\mathcal{S}$ is empty. We pick a check
node $C_1$ from $\mathcal{G}$ that contains the smallest number of bit nodes. Next, we add $C_1$ and all the bit
nodes contained in $C_1$ to $\mathcal{S}$. We keep adding check nodes and their associated bit nodes to $\mathcal{S}$
till $\mathcal{S}$ contains a pseudo encoding stopping set or an encoding stopping set. Each time we add a
check node to $\mathcal{S}$, we always pick the check node that contains the fewest outsider nodes of $\mathcal{S}$.
If $\mathcal{S}$ contains an encoding stopping set, we also add all the check nodes in $\mathcal{G} \setminus \mathcal{S}$ that contain
zero outsider nodes to $\mathcal{S}$. Next, we discuss two different cases.

$\mathcal{S}$ contains an encoding stopping set. Assume $\mathcal{S}$ contains $k$ check nodes and the $j$-th added
check node $C_j$ is the last check node that introduces outsider nodes to $\mathcal{S}$. We will show that
$k - j \le 2$. Assume $C_j$ adds $p$ outsider nodes $x_{j_1}, x_{j_2}, \ldots, x_{j_p}$ to $\mathcal{S}$. We will prove that the
$(j+1)$-th added check node $C_{j+1}$ connects to all the $p$ bit nodes $x_{j_1}, x_{j_2}, \ldots, x_{j_p}$. If $C_{j+1}$
does not connect to all the $p$ bit nodes, then $C_{j+1}$ contains a smaller number of outsider nodes
than $C_j$ and should be added earlier than $C_j$, since we always pick the check node that contains
the smallest number of outsider nodes and add it first to $\mathcal{S}$. This contradicts the fact that $C_{j+1}$
is added to $\mathcal{S}$ after $C_j$. Therefore, $C_{j+1}$ should connect to all the $p$ bit nodes $x_{j_1}, x_{j_2}, \ldots, x_{j_p}$.
Similarly, $C_{j+2}, \ldots, C_k$ connect to all the $p$ bit nodes $x_{j_1}, x_{j_2}, \ldots, x_{j_p}$. Since any bit node
can connect to at most three check nodes, it follows that $k - j \le 2$, which means at most two
check nodes are added to $\mathcal{S}$ after $C_j$. Notice that $\mathcal{S}$ does not contain any encoding stopping set
before adding $C_{j+1}$. Hence, the encoding stopping set in $\mathcal{S}$ is either a 1-fold-constraint encoding
stopping set or a 2-fold-constraint encoding stopping set. Condition (E3) is satisfied.

$\mathcal{S}$ is a pseudo encoding stopping set. It follows that the binary sum of all the check nodes
in $\mathcal{S}$ is zero. So, the degree of every bit node in $\mathcal{S}$ is an even number. Since the maximum bit
node degree is three, the degree of each bit node in $\mathcal{S}$ is two. Condition (E2) is satisfied.

This completes the proof. □

We detail the method of determining a pseudo encoding stopping set or an encoding stopping
set in Algorithm 5.

Algorithm 5 Find a pseudo encoding stopping set or an encoding stopping set (1-fold-constraint
or 2-fold-constraint) from a Tanner graph $\mathcal{G}$ with maximum bit node degree 3.

    Flag $\leftarrow$ 0.
    $i = 1$.
    while Flag $= 0$ and $\mathcal{G} \neq \mathcal{S}$ do
        Find a check node $C_i$ in $\mathcal{G} \setminus \mathcal{S}$ that contains the smallest number of outsider nodes of $\mathcal{S}$.
        Add $C_i$ and all its associated outsider nodes to $\mathcal{S}$.
        if $C_i$ does not introduce new bit nodes to $\mathcal{S}$ then
            while there exists a bit node $x$ of degree one in $\mathcal{A}$ do
                Delete the degree-one bit node $x$ and the check node connecting to $x$ from $\mathcal{A}$.
            end while
            if $\mathcal{A} = \emptyset$ then
                $\mathcal{S}$ does not contain any pseudo encoding stopping set or encoding stopping set.
            else
                Flag $\leftarrow$ 1.
            end if
        end if
        $i = i + 1$.
    end while
    if Flag $= 1$ then
        if all the bit nodes in $\mathcal{A}$ are of degree 2 then
            The subgraph $\mathcal{A}$ is a pseudo encoding stopping set.
        else
            The subgraph $\mathcal{A}$ is an encoding stopping set.
        end if
        Output $\mathcal{A}$.
    else
        The Tanner graph $\mathcal{G}$ does not contain pseudo encoding stopping sets or encoding stopping sets.
    end if

Theorem 3 Let $\mathcal{G}$ be the Tanner graph of an LDPC code. If the maximum bit node degree of $\mathcal{G}$
is three, then the LDPC code can be encoded in linear time and the encoding complexity is
less than $2 \cdot M \cdot (k - 1)$, where $M$ is the number of independent check nodes in $\mathcal{G}$ and $k$ is the
average number of bit nodes contained in each check node.

Proof: If the Tanner graph $\mathcal{G}$ does not contain any encoding stopping set, then the corresponding
LDPC code can be encoded in linear time by Theorem 2. Therefore, we only need to prove
Theorem 3 for the case that $\mathcal{G}$ contains encoding stopping sets. Since the maximum bit node
degree of $\mathcal{G}$ is three, by Lemma 3 there exists a pseudo encoding stopping set or an encoding
stopping set $\mathcal{G}_1$ in $\mathcal{G}$. If $\mathcal{G}_1$ is a pseudo encoding stopping set, we simply delete a redundant
check node from $\mathcal{G}_1$ and $\mathcal{G}_1$ becomes a pseudo-tree. If $\mathcal{G}_1$ is an encoding stopping set, it is either
a 1-fold-constraint or a 2-fold-constraint encoding stopping set by Lemma 3.

Next, we look at the subgraph $\mathcal{G} \setminus \mathcal{G}_1$. We first transform the parity check equations in $\mathcal{G} \setminus \mathcal{G}_1$
into generalized parity check equations by moving the bits contained in $\mathcal{G}_1$ from the left-hand
side of the equation to the right-hand side of the equation. Let a parity check equation $C$ contain
$k$ bit nodes $x_1, x_2, \ldots, x_k$, where the bits $x_{q+1}, \ldots, x_k$ are also in $\mathcal{G}_1$; then the parity check
equation $C$ can be rewritten as

$$
x_1 \oplus x_2 \oplus \ldots \oplus x_k = 0 \implies x_1 \oplus x_2 \oplus \ldots \oplus x_q = x_{q+1} \oplus x_{q+2} \oplus \ldots \oplus x_k = b
\tag{10}
$$

In equation (10), $b$ becomes a constant after we encode $\mathcal{G}_1$ and get the values of all the bits
in $\mathcal{G}_1$. Since the maximum bit node degree of $\mathcal{G} \setminus \mathcal{G}_1$ is less than or equal to three, we, again, find
a pseudo encoding stopping set or an encoding stopping set $\mathcal{G}_2$ from $\mathcal{G} \setminus \mathcal{G}_1$. If $\mathcal{G}_2$ is an encoding
stopping set, $\mathcal{G}_2$ is either a 1-fold-constraint or a 2-fold-constraint encoding stopping set by
Lemma 3. If $\mathcal{G}_2$ is a pseudo encoding stopping set and we assume $\mathcal{G}_2$ contains the following
$m$ generalized parity check equations,

$$
\begin{aligned}
x_{1,1} \oplus x_{1,2} \oplus \ldots \oplus x_{1,a_1} &= b_1 \\
x_{2,1} \oplus x_{2,2} \oplus \ldots \oplus x_{2,a_2} &= b_2 \\
&\;\;\vdots \\
x_{m,1} \oplus x_{m,2} \oplus \ldots \oplus x_{m,a_m} &= b_m
\end{aligned}
\tag{11}
$$

we derive that

$$
b_1 \oplus b_2 \oplus \ldots \oplus b_m = 0
\tag{12}
$$

Hence, we can replace any generalized parity check equation in (11) by the new parity check
equation (12). From the above analysis, we can delete any check node from $\mathcal{G}_2$ to make $\mathcal{G}_2$ a
pseudo-tree. To maintain the code structure, we also generate a new check node $C^*$ that represents
the parity check equation (12). Since the parity check equation (12) only contains bits in $\mathcal{G}_1$, we
add the new check node $C^*$ to $\mathcal{G}_1$ and regenerate encoding stopping sets or pseudo-trees in the
graph $\mathcal{G}_1 \cup C^*$.
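
The step from (11) to (12) can be spelled out as follows. This is a short sketch of the reasoning, assuming (as Lemma 3 states for a pseudo encoding stopping set) that every bit node on the left-hand sides of (11) has degree 2 within $\mathcal{G}_2$, so that each such bit appears in exactly two of the $m$ equations:

```latex
\bigoplus_{i=1}^{m} b_i
  \;=\; \bigoplus_{i=1}^{m} \bigoplus_{j=1}^{a_i} x_{i,j}
  \;=\; \bigoplus_{x \in \mathcal{G}_2} \bigl( x \oplus x \bigr)
  \;=\; 0 .
```

Each $b_i$ is itself an XOR of bits that belong to already encoded subgraphs, which is why the resulting equation (12) constrains only previously encoded bits.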

Generally, we can find a pseudo encoding stopping set or an encoding stopping set $\mathcal{G}_{i+1}$ from
the subgraph $\mathcal{G} \setminus \{\mathcal{G}_1 \cup \mathcal{G}_2 \cup \ldots \cup \mathcal{G}_i\}$. If $\mathcal{G}_{i+1}$ is an encoding stopping set, $\mathcal{G}_{i+1}$ is either a
1-fold-constraint or a 2-fold-constraint encoding stopping set by Lemma 3. If $\mathcal{G}_{i+1}$ is a pseudo
encoding stopping set, we operate in three steps. In the first step, we sum up all the generalized
parity check equations in $\mathcal{G}_{i+1}$ to generate a new parity check equation $C^*$. In the second step,
we delete one check node from $\mathcal{G}_{i+1}$ to make $\mathcal{G}_{i+1}$ a pseudo-tree. In the third step, we add
the new check node $C^*$ to $\mathcal{G}_i$ and regenerate pseudo-trees or encoding stopping sets in $\mathcal{G}_i \cup C^*$.
Notice that the new parity check equation $C^*$ in (12) does not incur extra cost to compute the
variables $b_1, b_2, \ldots, b_m$, since these variables have already been computed in those generalized
parity check equations in $\mathcal{G}_{i+1}$, as shown in (11). Practically, we can compute these variables
$b_1, b_2, \ldots, b_m$ only once and store them. Later, we can apply the stored values $b_1, b_2, \ldots, b_m$
to both equation (12) and equation (11). Hence, the new parity check equation $C^*$ only needs
$m - 1$ additional XOR operations to compute the summation of $b_1, b_2, \ldots, b_m$. Since the cost
of encoding the pseudo-tree $\mathcal{G}_{i+1}$ is $(m - 1) \cdot (k - 2)$, where $k$ is the average degree of the
remaining $m - 1$ check nodes in $\mathcal{G}_{i+1}$, the overall cost of encoding $\mathcal{G}_{i+1}$ and the new parity
check equation $C^*$ is $(m - 1) \cdot (k - 1)$.

By continuing to find pseudo-trees or encoding stopping sets in this way, we reach the stage
where $\mathcal{G} \setminus \{\mathcal{G}_1 \cup \mathcal{G}_2 \cup \ldots \cup \mathcal{G}_i\} = \emptyset$ or $\mathcal{G} \setminus \{\mathcal{G}_1 \cup \mathcal{G}_2 \cup \ldots \cup \mathcal{G}_i\}$ does not contain pseudo encoding
stopping sets or encoding stopping sets.

By the above analysis, we decompose the Tanner graph $\mathcal{G}$ into a sequence of $p$ subgraphs
$\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_p$, where $\mathcal{G}_i$, $1 \le i \le p$, is either a 1-fold-constraint encoding stopping set, a
2-fold-constraint encoding stopping set, or a pseudo-tree. If $\mathcal{G}_i$ is a 1-fold-constraint or a
2-fold-constraint encoding stopping set, we apply Algorithm 2 or Algorithm 4 to encode $\mathcal{G}_i$ and
the resulting encoding complexity is less than $2 \cdot M_i \cdot (k_i - 1)$, where $M_i$ denotes the number
of independent check nodes in $\mathcal{G}_i$ and $k_i$ denotes the average number of bit nodes contained
in each check node in $\mathcal{G}_i$. If $\mathcal{G}_i$ is a pseudo-tree, we apply Algorithm 1 to encode $\mathcal{G}_i$ and the
corresponding encoding complexity is less than $M_i \cdot (k_i - 1)$. The overall computation complexity
of encoding $\mathcal{G}$ is linear in the number of independent check nodes $M$ in $\mathcal{G}$ and is bounded by
$\sum_{i=1}^{p} (2 \cdot M_i \cdot (k_i - 1)) = 2 \cdot M \cdot (k - 1)$, where $k$ denotes the average number of bits contained
in each independent check node of $\mathcal{G}$. This completes the proof. □

We summarize the algorithm of decomposing a Tanner graph with maximum bit node degree 3 
into pseudo-trees and encoding stopping sets in Algorithm 6 and the algorithm to encode such 
LDPC codes in Algorithm 7. 

October 15, 2008 DRAFT

Algorithm 6 Decompose a Tanner graph $\mathcal{G}$ with maximum bit node degree 3 into
1-fold-constraint encoding stopping sets, 2-fold-constraint encoding stopping sets, and pseudo-trees.

    Find a pseudo encoding stopping set or an encoding stopping set $\mathcal{G}_1$ from $\mathcal{G}$ using Algorithm 5.
    $\mathcal{G} = \mathcal{G} \setminus \mathcal{G}_1$.
    if $\mathcal{G}_1$ is a pseudo encoding stopping set then
        Delete a check node in $\mathcal{G}_1$. $\mathcal{G}_1$ becomes a pseudo-tree.
    end if
    $i = 1$.
    while there exists a pseudo encoding stopping set or an encoding stopping set in $\mathcal{G}$ do
        Find a pseudo encoding stopping set or an encoding stopping set $\mathcal{G}_{i+1}$ from $\mathcal{G}$ using
        Algorithm 5. Assume $\mathcal{G}_{i+1}$ contains $m$ check nodes $C_1, C_2, \ldots, C_m$.
        $\mathcal{G} = \mathcal{G} \setminus \mathcal{G}_{i+1}$.
        if $\mathcal{G}_{i+1}$ is a pseudo encoding stopping set then
            $\mathcal{G}_{i+1} = \mathcal{G}_{i+1} \setminus C_m$. $\mathcal{G}_{i+1}$ becomes a pseudo-tree.
            Generate a new check node $C^* = C_1 \oplus C_2 \oplus \ldots \oplus C_m$.
            Add $C^*$ to $\mathcal{G}_i$ and regenerate pseudo-trees and encoding stopping sets in $\mathcal{G}_i \cup C^*$.
        end if
        $i = i + 1$.
    end while
    $\mathcal{G}_{i+1} = \mathcal{G}$.
    Output a sequence of subgraphs $\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_p$, where $\mathcal{G}_i$, $1 \le i \le p$, is either a pseudo-tree
    or an encoding stopping set (1-fold-constraint or 2-fold-constraint).

Next, we extend the linear time encoding method described in Theorem 3 to LDPC codes
with arbitrary column weight and row weight.

Theorem 4 Any LDPC code $\mathcal{C}$ with arbitrary column weight distribution and row weight
distribution can be encoded in linear time, and the encoding complexity is less than $4 \cdot M \cdot (k - 1)$,
where $M$ is the number of independent check nodes in $\mathcal{C}$ and $k$ is the average degree of check
nodes.

Algorithm 7 Linear complexity encoding algorithm for LDPC codes with maximum bit node
degree 3

Preprocessing (carry out only once):
    Apply Algorithm 6 to decompose the Tanner graph $\mathcal{G}$ of the code into $p$ subgraphs $\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_p$,
    where $\mathcal{G}_i$, $i = 1, 2, \ldots, p$, is either a pseudo-tree or a 1-fold-constraint encoding stopping
    set or a 2-fold-constraint encoding stopping set.
Encoding:
    for $i = 1$ to $p$ do
        Compute the constants on the right-hand side of the generalized parity check equations
        of $\mathcal{G}_i$ based on the already known bit values of $\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_{i-1}$.
        if $\mathcal{G}_i$ is a pseudo-tree then
            Encode $\mathcal{G}_i$ using Algorithm 1.
        else
            Encode $\mathcal{G}_i$ using Algorithm 2.
        end if
    end for
    Output the encoded codeword.

Proof: We first show that an LDPC code with arbitrary column weight distribution and row
weight distribution can be equivalently transformed into an LDPC code with maximum column
weight three. For example, Figure 8 on the left shows a bit node $x$ of degree 4. It can be split
into two bit nodes $x$ and $x'$ of degree 3 and an auxiliary check node $C'$, as shown on the right in
Figure 8. The auxiliary check node $C'$ is represented as $x \oplus x' = 0$, which means $x$ is equivalent
to $x'$. Originally the bit node $x$ connects to four check nodes $C_1$, $C_2$, $C_3$, and $C_4$. After node
splitting, $x'$ connects to $C_1$, $C_2$, and $x$ connects to $C_3$, $C_4$. Hence, the Tanner graph on the left
in Figure 8 is equivalent to the Tanner graph on the right in Figure 8.

Fig. 8. Transform a bit node of degree 4 into two bit nodes of degree 3 and an auxiliary check node.

Similarly, a bit node of degree 5 can be split into three bit nodes $x$, $x'$, and $x''$ and two auxiliary
check nodes $C'$ and $C''$, as shown in Figure 9. Generally, a bit node of degree $k$ can be equivalently
transformed into $k - 2$ bit nodes of degree 3 and $k - 3$ auxiliary check nodes, as shown in Figure 10.

Fig. 9. Transform a bit node of degree 5 into three bit nodes of degree 3 and two auxiliary check nodes.

Fig. 10. Transform a bit node of degree $k$ into $k - 2$ bit nodes of degree 3 and $k - 3$ auxiliary check nodes.

Assume an LDPC code $\mathcal{C}$ contains $M$ check nodes and $N$ bit nodes. The $M$ check nodes have degrees
$k_1, k_2, \ldots, k_M$, respectively. The bit nodes have degrees $j_1, j_2, \ldots, j_N$, respectively. Among
the bit nodes in $\mathcal{C}$, there are $s$ bit nodes whose degrees are greater than 3 and their degrees
are $j_1, j_2, \ldots, j_s$. This LDPC code can be equivalently transformed into another LDPC code $\mathcal{C}'$
with maximum column weight 3. The new code $\mathcal{C}'$ has $M + (\sum_{i=1}^{s} j_i - 3s)$ check nodes and
$N + (\sum_{i=1}^{s} j_i - 3s)$ bit nodes. By Theorem 3, the LDPC code $\mathcal{C}'$ can be encoded in linear time
and the encoding complexity is less than $2 \cdot M' \cdot (k' - 1)$, where $M'$ is the number of independent
check nodes in $\mathcal{C}'$ and $k'$ is the average degree of independent check nodes in $\mathcal{C}'$. Since there
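are $\sum_{i=1}^{s} j_i - 3s$ auxiliary check nodes in $\mathcal{C}'$ that have degree 2, we derive that

$$
\begin{aligned}
2 \cdot M' \cdot (k' - 1) &= 2 \Bigl[ M \cdot (k - 1) + \Bigl( \sum_{i=1}^{s} j_i - 3s \Bigr) \Bigr] \\
&< 2 \Bigl[ M \cdot (k - 1) + \Bigl( \sum_{i=1}^{M} k_i - M \Bigr) \Bigr] \\
&= 4 \cdot M \cdot (k - 1)
\end{aligned}
\tag{13}
$$

Therefore, the overall computation cost of encoding $\mathcal{C}'$ is less than $4 \cdot M \cdot (k - 1)$. As the LDPC
code $\mathcal{C}'$ is equivalent to the LDPC code $\mathcal{C}$, the complexity of encoding $\mathcal{C}$ is less than $4 \cdot M \cdot (k - 1)$.
This completes the proof. □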
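
The node-splitting transformation used in the proof of Theorem 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the code is represented as a list of check nodes, each a set of bit labels; the function name and the naming scheme for the copies (`x'1`, `x'2`, ...) are hypothetical; and the copies are chained so that the two end copies take two original check nodes each and every middle copy takes one.

```python
from collections import defaultdict


def split_high_degree_bits(checks):
    """Return a new list of check nodes in which every bit has degree at most 3.

    A bit of degree k > 3 is replaced by k - 2 copies linked by k - 3 auxiliary
    degree-2 check nodes (copy_t XOR copy_{t+1} = 0).
    """
    occurrences = defaultdict(list)
    for index, bits in enumerate(checks):
        for bit in sorted(bits):
            occurrences[bit].append(index)

    new_checks = [set(bits) for bits in checks]
    for bit, check_indices in occurrences.items():
        k = len(check_indices)
        if k <= 3:
            continue
        n_copies = k - 2
        copies = [bit] + [f"{bit}'{t}" for t in range(1, n_copies)]
        quota = [2] + [1] * (n_copies - 2) + [2]   # original checks per copy
        remaining = iter(check_indices)
        for copy, q in zip(copies, quota):
            for _ in range(q):
                ci = next(remaining)
                new_checks[ci].discard(bit)
                new_checks[ci].add(copy)
        for left, right in zip(copies, copies[1:]):
            new_checks.append({left, right})       # auxiliary check node
    return new_checks


if __name__ == "__main__":
    degree4 = [{"x", "a"}, {"x", "b"}, {"x", "c"}, {"x", "d"}]
    for check in split_high_degree_bits(degree4):
        print(sorted(check))
```

For a bit of degree 4 the sketch adds one auxiliary check node and one copy; for degree 5 it adds two of each, in agreement with the counts $k - 3$ and $k - 2$ stated above.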
Let's look at an example. The parity check matrix of a (13,26) LDPC code with column
weight 3 is shown in (14). Assume the values of the 13 information bits are 0, 1, 1, 1, 0, 1, 1,
0, 0, 1, 0, 1, and 1. We apply the proposed linear complexity encoding method to encode this
code.

10000000100010000000111000 
00010001000100010100001000 
00100000001000100001000011 
00000011010001000000100100 
00000100000000100011000011 
00001010000110001000001000 
00001000100011010100000000 
00100100001000000010000111 
01010010000000010000010100 
01000000110101000100000000 
00100100001000100011000000 
11000001000000001000100000 
10011000010000001000010000 

(14) 

Preprocessing. We construct an encoding stopping set from (14) using Algorithm 5. We start
from an empty graph $\mathcal{S}$ and add check nodes and their associated bit nodes to $\mathcal{S}$. Each time
we add a check node, we always pick the check node that contains the smallest number of
outsider nodes of $\mathcal{S}$. After adding 7 check nodes, the resulting graph is a pseudo-tree, as shown
in Figure 11. When 9 check nodes are considered, we get the 2-fold-constraint encoding stopping
set $\mathcal{E}_1$ shown in Figure 12. The bits $x_9$, $x_{13}$, $x_{22}$, $x_{23}$, $x_5$, $x_{16}$, $x_4$, $x_{12}$, $x_{17}$ are information
bits. The two check nodes $C_{10}$ and $C_{12}$ are key check nodes, and the two bit nodes $x_1$ and $x_{18}$
are reevaluated bits.

After finding the encoding stopping set $\mathcal{E}_1$, the remaining Tanner graph of the code can be
constructed to be a 2-fold-constraint encoding stopping set $\mathcal{E}_2$, as shown in Figure 13. Therefore,
the LDPC code can be partitioned into two encoding stopping sets $\mathcal{E}_1$ and $\mathcal{E}_2$ that are shown in
Figure 13. The bits $x_{11}$, $x_{15}$, $x_{19}$, $x_{25}$ in the encoding stopping set $\mathcal{E}_2$ are information bits. The
two check nodes $C_3$ and $C_5$ are key check nodes of $\mathcal{E}_2$, and the two bit nodes $x_3$ and $x_6$ are
reevaluated bits of $\mathcal{E}_2$.

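Encoding.
Encode $\mathcal{E}_1$:

Step 1. Fill the values of the information bits, i.e., $[x_9\ x_{13}\ x_{22}\ x_{23}\ x_5\ x_{16}\ x_4\ x_{12}\ x_{17}] = [0\ 1\ 1\ 1\ 0\ 1\ 1\ 0\ 0]$.
Assign $x_1 = 0$ and $x_{18} = 0$.
Step 2. Encode the pseudo-tree shown in Figure 11. Compute the parity bits $x_{21}$, $x_{14}$, $x_8$, $x_7$, $x_{10}$,
$x_{24}$, $x_2$ as follows.

$$
\begin{aligned}
x_{21} &= x_1 \oplus x_9 \oplus x_{13} \oplus x_{22} \oplus x_{23} = 1 \\
x_{14} &= x_9 \oplus x_{13} \oplus x_5 \oplus x_{16} \oplus x_{18} = 0 \\
x_8 &= x_{23} \oplus x_{16} \oplus x_{18} \oplus x_4 \oplus x_{12} = 1 \\
x_7 &= x_{13} \oplus x_{23} \oplus x_5 \oplus x_{12} \oplus x_{17} = 0 \\
x_{10} &= x_1 \oplus x_{22} \oplus x_5 \oplus x_4 \oplus x_{17} = 0 \\
x_{24} &= x_{21} \oplus x_{14} \oplus x_8 \oplus x_7 \oplus x_{10} = 0 \\
x_2 &= x_{22} \oplus x_{24} \oplus x_{16} \oplus x_7 \oplus x_4 = 1
\end{aligned}
$$

Step 3. Compute the values of the parity check equations $C_{10}$ and $C_{12}$:
$\hat{C}_{10} = x_2 \oplus x_9 \oplus x_{10} \oplus x_{12} \oplus x_{14} \oplus x_{18} = 1$, and $\hat{C}_{12} = x_1 \oplus x_2 \oplus x_8 \oplus x_{17} \oplus x_{21} = 1$.
Step 4. Since $\hat{C}_{10} = 1$ and $\hat{C}_{12} = 1$, the correct values of $x_1$ and $x_{18}$ are $x_1 = \hat{C}_{10} = 1$ and
$x_{18} = \hat{C}_{12} = 1$.
Step 5. Recompute the parity bits $x_{21}$, $x_{14}$, $x_8$, and $x_{10}$ based on the new values of $x_1$ and $x_{18}$. We
derive that $x_{21} = 0$, $x_{14} = 1$, $x_8 = 0$, $x_{10} = 1$.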

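Encode $\mathcal{E}_2$:

Step 6. Fill the values of the information bits, i.e., $[x_{11}\ x_{15}\ x_{19}\ x_{25}] = [1\ 0\ 1\ 1]$. Assign $x_3 = 0$
and $x_6 = 0$.

Step 7. Compute the parity bits $x_{20}$, $x_{26}$ as follows.

$$
\begin{aligned}
x_{20} &= x_3 \oplus x_6 \oplus x_{11} \oplus x_{15} \oplus x_{19} = 0 \\
x_{26} &= x_3 \oplus x_6 \oplus x_{11} \oplus x_{19} \oplus x_{25} \oplus x_{24} = 1
\end{aligned}
$$

Notice that the value of the parity bit $x_{26}$ is based on the value of the bit $x_{24}$ in $\mathcal{E}_1$.
Step 8. Compute the values of the parity check equations $C_3$ and $C_5$:
$\hat{C}_3 = x_3 \oplus x_{20} \oplus x_{11} \oplus x_{15} \oplus x_{26} \oplus x_{25} = 1$, and $\hat{C}_5 = x_6 \oplus x_{20} \oplus x_{15} \oplus x_{19} \oplus x_{26} \oplus x_{25} = 1$.

Step 9. Since $\hat{C}_3 = 1$ and $\hat{C}_5 = 1$, the correct values of $x_3$ and $x_6$ are $x_3 = \hat{C}_3 = 1$ and $x_6 = \hat{C}_5 = 1$.
The encoded codeword is

$[x_9\ x_{13}\ x_{22}\ x_{23}\ x_5\ x_{16}\ x_4\ x_{12}\ x_{17}\ x_1\ x_{18}\ x_{21}\ x_{14}\ x_8\ x_7\ x_{10}\ x_{24}\ x_2\ x_{11}\ x_{15}\ x_{19}\ x_{25}\ x_3\ x_6\ x_{20}\ x_{26}]$

$= [0\ 1\ 1\ 1\ 0\ 1\ 1\ 0\ 0\ 1\ 1\ 0\ 1\ 0\ 0\ 1\ 0\ 1\ 1\ 0\ 1\ 1\ 1\ 1\ 0\ 1]$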
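
The encoded codeword can be checked against the parity check matrix (14) by computing the syndrome. The sketch below is a verification aid, not part of the encoder: the variable names are hypothetical, and it assumes the codeword entries are ordered as the information, reevaluated, and parity bits of the first encoding stopping set followed by those of the second, with $x_{15}$ between $x_{11}$ and $x_{19}$ (an assumption about the 26-entry ordering).

```python
# Rows of the parity check matrix (14); column j (1-based) corresponds to bit x_j.
H_ROWS = [
    "10000000100010000000111000",
    "00010001000100010100001000",
    "00100000001000100001000011",
    "00000011010001000000100100",
    "00000100000000100011000011",
    "00001010000110001000001000",
    "00001000100011010100000000",
    "00100100001000000010000111",
    "01010010000000010000010100",
    "01000000110101000100000000",
    "00100100001000100011000000",
    "11000001000000001000100000",
    "10011000010000001000010000",
]

# Assumed ordering of the codeword entries and the corresponding bit values.
ORDER = [9, 13, 22, 23, 5, 16, 4, 12, 17, 1, 18, 21, 14,
         8, 7, 10, 24, 2, 11, 15, 19, 25, 3, 6, 20, 26]
VALUES = "01110110011010010110111101"


def codeword_in_natural_order(order, values):
    """Place each value at the position of its bit index, giving [x_1, ..., x_26]."""
    x = [0] * len(order)
    for index, value in zip(order, values):
        x[index - 1] = int(value)
    return x


def syndrome(rows, x):
    """Evaluate every parity check equation modulo 2."""
    return [sum(int(h) * bit for h, bit in zip(row, x)) % 2 for row in rows]


if __name__ == "__main__":
    x = codeword_in_natural_order(ORDER, VALUES)
    print(syndrome(H_ROWS, x))
```

An all-zero syndrome confirms that all 13 parity check equations are satisfied by the codeword obtained in Steps 1 to 9.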

VIII. Conclusion 

This paper proposes a linear complexity encoding method for general LDPC codes by
analyzing and encoding their Tanner graphs. We show that two particular types of Tanner graphs,
pseudo-trees and encoding stopping sets, can be encoded in linear time. Then, we prove that any
Tanner graph can be decomposed into pseudo-trees and encoding stopping sets. By encoding
the pseudo-trees and encoding stopping sets in a sequential order, we achieve linear complexity
encoding for arbitrary LDPC codes. The proposed method can be applied to a wide range of
codes; it is not limited to LDPC codes. It is applicable to both regular LDPC codes and irregular
LDPC codes. It is also suitable for both "low density" parity check nodes and "medium-to-high
density" parity check nodes. In fact, the proposed linear time encoding method is applicable to
any type of block code. It removes the problem of high encoding complexity for all long block
codes that historically have been encoded by matrix multiplication.
Fig. 11. A pseudo-tree built from the LDPC code described in (14).

Appendix A

Finding reevaluated bits $x_\gamma$ and $x_\delta$ in a 2-fold-constraint encoding stopping set

The details are described in Algorithm 8.
Algorithm 8 Finding reevaluated bits $x_\gamma$ and $x_\delta$ in a 2-fold-constraint encoding stopping set

    Represent the two key check equations $C_\alpha$ and $C_\beta$ as functions of the information bits only.
    Assume $C_\alpha$ is associated with $q$ information bits $x_{\alpha_1}, x_{\alpha_2}, \ldots, x_{\alpha_q}$ and $C_\beta$ is associated with
    $p$ information bits $x_{\beta_1}, x_{\beta_2}, \ldots, x_{\beta_p}$.
    Flag $\leftarrow$ 0.
    for $i = 1$ to $q$ do
        if $x_{\alpha_i}$ is contained in $C_\alpha$ but not in $C_\beta$ then
            Flag $\leftarrow$ 1.
            Choose the reevaluated bit $x_\gamma$ to be $x_\gamma = x_{\alpha_i}$.
            exit the for loop.
        end if
    end for
    if Flag $= 1$ then
        Choose the reevaluated bit $x_\delta$ to be $x_\delta = x_{\beta_1}$.
    else
        Choose the reevaluated bit $x_\gamma$ to be $x_\gamma = x_{\alpha_1}$.
        for $i = 1$ to $p$ do
            if $x_{\beta_i}$ is contained in $C_\beta$ but not in $C_\alpha$ then
                Choose the reevaluated bit $x_\delta$ to be $x_\delta = x_{\beta_i}$.
                exit the for loop.
            end if
        end for
    end if
    Output the two chosen reevaluated bits $x_\gamma$ and $x_\delta$.
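
The selection rule of Algorithm 8 can be sketched as follows. This is an illustrative sketch: the function name and the list-based representation are assumptions, and the inputs are taken to be the information bits appearing in each key check equation after it has been rewritten in terms of information bits only.

```python
def find_reevaluated_bits(alpha_bits, beta_bits):
    """Pick (x_gamma, x_delta) from the information bits of C_alpha and C_beta.

    alpha_bits, beta_bits : ordered lists of bit labels (hypothetical representation).
    """
    alpha_set, beta_set = set(alpha_bits), set(beta_bits)

    # First pass: look for a bit that is in C_alpha but not in C_beta.
    for bit in alpha_bits:
        if bit not in beta_set:
            return bit, beta_bits[0]

    # Otherwise every bit of C_alpha is also in C_beta: take the first one as
    # x_gamma and look for a bit that is in C_beta but not in C_alpha.
    x_gamma = alpha_bits[0]
    for bit in beta_bits:
        if bit not in alpha_set:
            return x_gamma, bit
    raise ValueError("no pair satisfying conditions (D1) to (D3) was found")


if __name__ == "__main__":
    print(find_reevaluated_bits(["a", "b"], ["a", "c"]))
    print(find_reevaluated_bits(["a", "b"], ["a", "b", "c"]))
```

In both branches at most one of the two chosen bits appears in both key check equations, which is what condition (D3) requires.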